77 research outputs found
Consistency of Feature Markov Processes
We are studying long term sequence prediction (forecasting). We approach this
by investigating criteria for choosing a compact useful state representation.
The state is supposed to summarize useful information from the history. We want
a method that is asymptotically consistent in the sense that it will provably,
eventually, choose only between alternatives that satisfy an optimality property
related to the criterion used. We extend our work to the case where there is
side information that one can take advantage of and, furthermore, we briefly
discuss the active setting where an agent takes actions to achieve desirable
outcomes. Comment: 16 LaTeX pages
Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series
An interesting scheme for estimating and adapting distributions in real time for non-stationary data has recently been the focus of study for several different tasks relating to time series and data mining, namely change point detection, outlier detection and online compression/sequence prediction. An appealing feature is that, unlike more sophisticated procedures, it is as fast as the related stationary procedures, which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme to introduce an adaptive version of the Context Tree Weighting compression algorithm. The change point and outlier detection methods mentioned above rely on the changing compression ratio of an online compression algorithm. Here we begin to provide theoretical foundations for the use of these adaptive estimation procedures, which have already shown practical promise.
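As a rough illustration of the discounting idea described in this abstract, the sketch below applies a discount factor to the counts of a Krichevsky-Trofimov (KT) style estimator for binary data. The abstract does not give the exact update rule, so the function name `discounted_kt_probs`, the parameter `gamma`, and the choice of the KT estimator as the base procedure are all assumptions made for illustration.

```python
def discounted_kt_probs(bits, gamma=0.9):
    """Predict P(next bit = 1) before each bit is observed.

    Hypothetical sketch: a KT-style estimator whose counts are multiplied
    by `gamma` at every step, so older observations lose influence.
    With gamma = 1.0 it reduces to the ordinary (stationary) KT estimator.
    """
    zeros = 0.0
    ones = 0.0
    probs = []
    for b in bits:
        # KT prediction from the current (discounted) counts.
        probs.append((ones + 0.5) / (zeros + ones + 1.0))
        # Discount old evidence, then add the new observation.
        zeros *= gamma
        ones *= gamma
        if b == 1:
            ones += 1.0
        else:
            zeros += 1.0
    return probs


if __name__ == "__main__":
    # A source that switches from all zeros to all ones halfway through.
    data = [0] * 50 + [1] * 50
    print(discounted_kt_probs(data, gamma=0.9)[-1])
    print(discounted_kt_probs(data, gamma=1.0)[-1])
```

On this switching sequence the discounted estimator's final prediction for a 1 is above 0.9, while the undiscounted estimator is still just below 0.5 because the early zeros keep their full weight. The per-step cost is the same in both cases, which matches the abstract's remark that discounting is as fast as the stationary procedure.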
The Sample-Complexity of General Reinforcement Learning
We present a new algorithm for general reinforcement learning where the true
environment is known to belong to a finite class of N arbitrary models. The
algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps with
high probability. Infinite classes are also considered where we show that
compactness is a key criterion for determining the existence of uniform
sample-complexity bounds. A matching lower bound is given for the finite case. Comment: 16 pages
Concentration and Confidence for Discrete Bayesian Sequence Predictors
Bayesian sequence prediction is a simple technique for predicting future
symbols sampled from an unknown measure on infinite sequences over a countable
alphabet. While strong bounds on the expected cumulative error are known, there
are only limited results on the distribution of this error. We prove tight
high-probability bounds on the cumulative error, which is measured in terms of
the Kullback-Leibler (KL) divergence. We also consider the problem of
constructing upper confidence bounds on the KL and Hellinger errors similar to
those constructed from Hoeffding-like bounds in the i.i.d. case. The new
results are applied to show that Bayesian sequence prediction can be used in
the Knows What It Knows (KWIK) framework with bounds that match the
state-of-the-art. Comment: 17 pages
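The "strong bounds on the expected cumulative error" mentioned in this abstract can be illustrated with the classical result that, for a Bayesian mixture over a countable class containing the true measure, the expected cumulative KL error is at most ln(1/w), where w is the prior weight of the true measure. The sketch below checks this numerically in a deliberately simple setting that is an assumption of this example, not the paper's: a finite class of i.i.d. Bernoulli models over a binary alphabet. The function name and parameters are hypothetical.

```python
import math


def expected_cumulative_kl(thetas, weights, true_index, n):
    """Expected cumulative KL error of a Bayes mixture after n symbols.

    Illustrative assumptions: the model class is a finite set of i.i.d.
    Bernoulli(theta) sources with every theta strictly between 0 and 1,
    `weights` is the prior, and the source at `true_index` generates the
    data. By the chain rule, the expected sum of per-step KL divergences
    equals KL(mu^n || xi^n), computed here by grouping the 2^n sequences
    by their number of ones.
    """
    mu = thetas[true_index]
    total = 0.0
    for k in range(n + 1):
        # Probability of one specific sequence with k ones, under mu.
        p_mu = mu**k * (1.0 - mu) ** (n - k)
        # Probability of the same sequence under the Bayes mixture xi.
        p_xi = sum(w * t**k * (1.0 - t) ** (n - k)
                   for w, t in zip(weights, thetas))
        total += math.comb(n, k) * p_mu * math.log(p_mu / p_xi)
    return total


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.8]
    weights = [1 / 3, 1 / 3, 1 / 3]
    for n in (1, 10, 100):
        err = expected_cumulative_kl(thetas, weights, true_index=2, n=n)
        print(n, err, "bound:", math.log(3))
```

The bound holds for every n because the mixture assigns each sequence at least w times the probability the true measure assigns it. This only concerns the expectation; the abstract's contribution is about how the error is distributed around that expectation, which this sketch does not attempt to reproduce.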
A dual process theory of optimistic cognition
Optimism is a prevalent bias in human cognition, including variations like self-serving beliefs, illusions of control and overly positive views of one's own future. Further, optimism has been linked with both success and happiness. In fact, it has been described as a part of human mental well-being, which has otherwise been assumed to be about being connected to reality. In reality, only people suffering from depression are realistic. Here we study a formalization of optimism within a dual process framework and study its usefulness beyond human needs in a way that also applies to artificial reinforcement learning agents. Optimism enables systematic exploration, which is essential in a (partially) unknown world. The key property of an optimistic hypothesis is that if it is not contradicted when one acts greedily with respect to it, then one is well rewarded even if it is wrong.
Principles of Solomonoff Induction and AIXI
We identify principles characterizing Solomonoff Induction by demands on an
agent's external behaviour. Key concepts are rationality, computability,
indifference and time consistency. Furthermore, we discuss extensions to the
full AI case to derive AIXI. Comment: 14 LaTeX pages
Axioms for Rational Reinforcement Learning
We provide a formal, simple and intuitive theory of rational decision making, including sequential decisions that affect the environment. The theory has a geometric flavor, which makes the arguments easy to visualize and understand. Our theory is for complete decision makers, which means that they have a complete set of preferences. Our main result shows that a complete rational decision maker implicitly has a probabilistic model of the environment. We have a countable version of this result that sheds light on the issue of countable vs finite additivity by showing how it depends on the geometry of the space over which we have preferences. This is achieved by connecting rationality with the Hahn-Banach Theorem. The theory presented here can be viewed as a formalization and extension of the betting odds approach to probability of Ramsey and De Finetti [Ram31, deF37].
Feature reinforcement learning: state of the art
Feature reinforcement learning was introduced five years ago as a principled and practical approach to history-based learning. This paper examines the progress since its inception. We now have both model-based and model-free cost functions, most recently extended to the function approximation setting. Our current work is geared towards playing Atari games using imitation learning, where we use Feature RL as a feature selection method for high-dimensional domains.